Abstract

We first introduce a class of divergence measures between power spectral density matrices. These are derived by 
comparing the suitability of different models in the context of optimal prediction. Distances between "infinitesimally 
close" power spectra are quadratic, and hence, they induce a differential-geometric structure. We study the
corresponding Riemannian metrics and, for a particular case, provide explicit formulae for the corresponding geodesics
and geodesic distances. The close connection between the geometry of power spectra and the geometry of the 
Fisher-Rao metric is noted. 

I. Introduction 



Distance measures between statistical models and between signals constitute some of the basic tools of Signal
Processing, System Identification, and Control [1], [2]. Indeed, quantifying dissimilarities is the essence of detection,
tracking, pattern recognition, model validation, signal classification, etc. Naturally, a variety of choices are readily 
available for comparing deterministic signals and systems. These include various Lp and Sobolev norms on signal 
spaces, and induced norms in spaces of systems. Statistical models on the other hand are not elements of a linear 
space. Their geometry is dictated by positivity constraints and hence, they lie on suitable cones or simplices. This 
is the case for covariances, histograms, probability distributions, or power spectra, as these need to be positive in 
a suitable sense. A classical theory for statistical models, having roots in the work of C.R. Rao and R.A. Fisher, 
is now known as "information geometry" [3], [4], [5], [6]. The present work aims at a geometric theory suitable
for time-series modeled by power spectra. To this end, we follow a largely parallel route to that of information
geometry (see IH) in that a metric is now dictated by the dissimilarity of models in the context of prediction theory
for second-order stochastic processes. The present work builds on [7], which focused on scalar time-series, and is
devoted to power spectral densities of multivariable stochastic processes.

The need to compare two power spectral densities $f_1$, $f_2$ directly has led to a number of divergence measures
which have been suggested at various times [1], [2]. Key among those are the Itakura-Saito distance

$$\int_{-\pi}^{\pi}\left(\frac{f_1(\theta)}{f_2(\theta)} - \log\frac{f_1(\theta)}{f_2(\theta)} - 1\right)\frac{d\theta}{2\pi}$$

and the logarithmic spectral deviation

$$\sqrt{\int_{-\pi}^{\pi}\left|\log\frac{f_1(\theta)}{f_2(\theta)}\right|^2\frac{d\theta}{2\pi}}$$

(see e.g., [2, page 370]). The distance measures developed in [7] are closely related to both of these, and the
development herein provides a multivariable counterpart. Indeed, the divergences that we list between matrix-valued
power spectra are similar to the Itakura-Saito divergence, and geodesics on the corresponding Riemannian
manifolds of power spectra take the form of logarithmic integrals.

Distances between multivariable power spectra have only recently received any attention. In this direction we
mention generalizations of the Hellinger and Itakura-Saito distances by Ferrante et al. [8], [9] and the use of the
Umegaki-von Neumann relative entropy [10]. The goal of this paper is to generalize the geometric framework in
[7] to matrix-valued power spectra. We compare two power spectra in the context of linear prediction: a choice
between the two is used to design an optimal filter which is then applied to a process corresponding to the second
power spectrum. The "flatness" of the innovations process, as well as the degradation of the prediction error variance
when compared to the best possible, are used to quantify the mismatch between the two. This rationale provides us
with natural divergence measures. We then identify corresponding Riemannian metrics that dictate the underlying
geometry. For a certain case we compute closed-form expressions for the induced geodesics and geodesic distances.
These provide a multivariable counterpart to the logarithmic intervals in [7] and the logarithmic spectral deviation
[2, page 370]. It is noted that the geodesic distance has certain natural desirable properties; it is inverse-invariant
and congruence-invariant. Moreover, the manifold of the multivariate spectral density functions endowed with this
geodesic distance is a complete metric space. A discrete counterpart of certain of these Riemannian metrics, on
the manifold of positive definite matrices (equivalent to power spectra which are constant across frequencies), has
been studied extensively in connection to the geometry of positive operators [11] and relates to the Rao-Fisher
geometry on probability models restricted to the case of Gaussian random vectors.

Indeed, there is a deep connection between the Itakura-Saito distance and the Kullback-Leibler divergence
between the corresponding probability models [2, page 371], [12] which provides a link to information geometry.
Hence, the Riemannian geometry on power spectral densities in [7] as well as the multivariable structure presented
herein is expected to have a strong connection also to the Fisher-Rao metric and the geometry of information.
An interesting study in this direction, which draws on an interpretation of the geometry of power spectra via the
underlying probability structure and its connection to the Kullback-Leibler divergence, is given in Yu and Mehta
[13]. However, a transparent differential geometric explanation which highlights points of contact is still to be
developed. Further key developments which parallel the framework reported herein and are focused on moment
problems are presented in [8], [9].

The paper is organized as follows. In Section II we establish notation and overview the theory of the multivariate
quadratic optimal prediction problem. In Section III we introduce alternative distance measures between multivariable
power spectra which reflect mismatch in the context of one-step-ahead prediction. In Section IV we discuss
Riemannian metrics that are induced by the divergence measures of the previous section. In Section V we discuss
the geometry of positive matrices. In Section VI the geometric structure is analyzed and geodesics are identified. In
Section VII we provide examples to highlight the nature of geodesics between power spectra and how these may
compare to alternatives.

II. Preliminaries on Multivariate Prediction 

Consider a multivariate discrete-time, zero mean, weakly stationary stochastic process $\{\mathbf{u}(k),\ k \in \mathbb{Z}\}$ with $\mathbf{u}(k)$
taking values in $\mathbb{C}^{m\times 1}$. Throughout, boldface denotes random variables/vectors, $\mathcal{E}$ denotes expectation, $j = \sqrt{-1}$
the imaginary unit, and $*$ the complex conjugate transpose. Let

$$R_k = \mathcal{E}\{\mathbf{u}(\ell)\mathbf{u}^*(\ell - k)\} \quad \text{for } \ell, k \in \mathbb{Z}$$

denote the sequence of matrix covariances and $d\mu(\theta)$ be the corresponding matricial power spectral measure for
which

$$R_k = \int_{-\pi}^{\pi} e^{-jk\theta}\,\frac{d\mu(\theta)}{2\pi}.$$

For the most part, we will be concerned with the case of non-deterministic processes with an absolutely continuous
power spectrum. Hence, unless we specifically indicate otherwise, $d\mu(\theta) = f(\theta)d\theta$ with $f(\theta)$ being a matrix-valued
power spectral density (PSD) function. Further, for a non-deterministic process $\log(f(\theta))$ needs to be integrable,
and this will be assumed throughout as well.
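As a small numerical illustration of the relation between a PSD and its covariance lags, the sketch below approximates $R_k$ by a Riemann sum on a uniform frequency grid. The moving-average example, the matrix `B`, and the function names are assumptions introduced here for illustration only; they do not appear in the original development.

```python
import numpy as np

def covariance_lag(f, k, n_grid=1024):
    """Approximate R_k = (1/2pi) * integral over [-pi, pi] of exp(-j k theta) f(theta) dtheta."""
    theta = -np.pi + 2 * np.pi * np.arange(n_grid) / n_grid   # uniform periodic grid
    samples = np.array([np.exp(-1j * k * t) * f(t) for t in theta])
    return samples.mean(axis=0)                               # Riemann sum divided by 2*pi

# Hypothetical example: u(k) = w(k) + B w(k-1), with w white and of unit covariance.
B = np.array([[0.5, 0.2], [0.0, -0.3]])

def f_ma1(theta):
    """PSD of the example process: f = (I + B e^{j theta})(I + B e^{j theta})^*."""
    A = np.eye(2) + B * np.exp(1j * theta)
    return A @ A.conj().T

R0 = covariance_lag(f_ma1, 0)   # should agree with I + B B^T
R1 = covariance_lag(f_ma1, 1)   # should agree with B
```

For this example the integrand is a trigonometric polynomial, so the uniform-grid sum reproduces the lags up to rounding error.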

Our interest is in comparing PSD's and in studying possible metrics between such. The evident goal is to 
provide a means to quantify deviations and uncertainty in the spectral domain in a way that is consistent with 
particular applications. More specifically, we present metrizations of the space of PSD's which are dictated by 
optimal prediction and reflect dissimilarities that have an impact on the quality of prediction. 

A. Geometry of multivariable processes 

We will be considering least-variance linear prediction problems. To this end, we define $L_{2,\mathbf{u}}$ to be the closure
of $m \times 1$-vector-valued finite linear combinations of $\{\mathbf{u}(k)\}$ with respect to convergence in the mean [14, pg. 135]:

$$L_{2,\mathbf{u}} := \overline{\left\{ \sum_{\rm finite} P_k \mathbf{u}(-k) \ :\ P_k \in \mathbb{C}^{m\times m},\ k \in \mathbb{Z} \right\}}.$$
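Here, "bar" denotes closure. The indices in $P_k$ and $\mathbf{u}(-k)$ run in opposite directions so as to simplify the notation
later on where prediction is based on past observations. This space is endowed with both a matricial inner product

$$\Big[\!\!\Big[ \sum_k P_k \mathbf{u}(-k),\ \sum_k Q_k \mathbf{u}(-k) \Big]\!\!\Big] := \mathcal{E}\Big\{ \Big(\sum_k P_k \mathbf{u}(-k)\Big)\Big(\sum_k Q_k \mathbf{u}(-k)\Big)^* \Big\}$$

as well as a scalar inner product

$$\Big\langle \sum_k P_k \mathbf{u}(-k),\ \sum_k Q_k \mathbf{u}(-k) \Big\rangle := \operatorname{tr}\Big[\!\!\Big[ \sum_k P_k \mathbf{u}(-k),\ \sum_k Q_k \mathbf{u}(-k) \Big]\!\!\Big].$$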

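Throughout, "tr" denotes the trace of a matrix. It is standard to establish the correspondence between

$$\mathbf{p} := p(\mathbf{u}) := \sum_k P_k \mathbf{u}(-k) \quad \text{and} \quad p(z) := \sum_k P_k z^k$$

with $z = e^{j\theta}$ for $\theta \in [-\pi, \pi]$. This is the Kolmogorov isomorphism between the "temporal" space $L_{2,\mathbf{u}}$ and
"spectral" space $L_{2,d\mu}$,

$$\varphi : L_{2,\mathbf{u}} \to L_{2,d\mu} \ :\ \sum_k P_k \mathbf{u}(-k) \mapsto \sum_k P_k z^k.$$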

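It is convenient to endow the latter space $L_{2,d\mu}$ with the matricial inner product

$$[\![p, q]\!]_{d\mu} := \int_{-\pi}^{\pi} p(e^{j\theta})\,\frac{d\mu(\theta)}{2\pi}\, q(e^{j\theta})^*$$

as well as the scalar inner product

$$\langle p, q \rangle_{d\mu} := \operatorname{tr}\,[\![p, q]\!]_{d\mu}.$$

The additional structure due to the matricial inner product is often referred to as Hilbertian (as opposed to Hilbert)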

na. 

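Throughout, $p(e^{j\theta}) = \sum_k P_k e^{jk\theta}$, $q(e^{j\theta}) = \sum_k Q_k e^{jk\theta}$, where we use lower case $p, q$ for matrix functions and
upper case $P_k, Q_k$ for their matrix coefficients. For non-deterministic processes with absolutely continuous spectral
measure $d\mu(\theta) = f(\theta)d\theta$, we simplify the notation into

$$[\![p, q]\!]_f := [\![p, q]\!]_{f d\theta}, \quad \text{and} \quad \langle p, q \rangle_f := \langle p, q \rangle_{f d\theta}.$$

Least-variance linear prediction

$$\min\Big\{ \operatorname{tr}\,\mathcal{E}\{\mathbf{p}\mathbf{p}^*\} \ :\ \mathbf{p} = \mathbf{u}(0) - \sum_{k>0} P_k \mathbf{u}(-k),\ P_k \in \mathbb{C}^{m\times m} \Big\} \tag{1}$$

can be expressed equivalently in the spectral domain

$$\min\Big\{ [\![p, p]\!]_f \ :\ p(z) = I - \sum_{k>0} P_k z^k,\ P_k \in \mathbb{C}^{m\times m} \Big\} \tag{2}$$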

where the minimum is sought in the positive-definite sense, see [15, pg. 354], [14, pg. 143]. We use "$I$" to denote
the identity matrix of suitable size. It holds that, although non-negative definiteness defines only a partial order on
the cone of non-negative definite Hermitian matrices, a minimizer for (1) always exists. Of course this corresponds
to a minimizer for (2). The existence of a minimizer is due to the fact that $\operatorname{tr}\,\mathcal{E}\{\mathbf{p}\mathbf{p}^*\}$ is matrix-convex. Here
$d\mu = f d\theta$ is an absolutely continuous measure and the quadratic form is not degenerate; see [16, Proposition 1]
for a detailed analysis and a treatment of the singular case where $\mu$ is a discrete matrix-valued measure. Further,
the minimizer of (1) coincides with the minimizer of

$$\min\Big\{ \langle p, p \rangle_f \ :\ p(z) = I - \sum_{k>0} P_k z^k,\ P_k \in \mathbb{C}^{m\times m} \Big\}. \tag{3}$$

From here on, to keep notation simple, $p(z)$ will denote the minimizer of such a problem, with $f$ specified
accordingly, and the minimal matrix of (1) will be denoted by $\Omega$. That is,

$$\Omega := [\![p, p]\!]_f$$

while the minimal value of (3) is $\operatorname{tr}\,\Omega$. The minimizer $p$ is precisely the image under the Kolmogorov isomorphism
of the optimal prediction error $\mathbf{p}$, and $\Omega$ is the prediction-error variance.

B. Spectral factors and optimal prediction 

For a non-deterministic process the error variance $\Omega$ has full rank. Equivalently, the product of its eigenvalues
is non-zero. The well-known Szegő-Kolmogorov formula [15, pg. 369]

$$\det \Omega = \exp\left\{ \int_{-\pi}^{\pi} \log\det f(\theta)\,\frac{d\theta}{2\pi} \right\} \tag{4}$$

relates the product of the eigenvalues of the optimal one-step-ahead prediction error variance with the corresponding
PSD. No expression is available in general that would relate $f$ to $\Omega$ directly in the matricial case.
We consider only non-deterministic processes and hence we assume that

$$\log\det f(\theta) \in L_1[-\pi, \pi].$$

In this case, $f(\theta)$ admits a unique factorization

$$f(\theta) = f_+(e^{j\theta})\, f_+(e^{j\theta})^*, \tag{5}$$

with $f_+(e^{j\theta}) \in \mathcal{H}_2^{m\times m}(\mathbb{D})$,

$$\det(f_+(z)) \neq 0 \ \text{ in } \ \mathbb{D} := \{z : |z| < 1\},$$

and normalized so that $f_+(0) = \Omega^{1/2}$. Throughout, $M^{1/2}$ denotes the Hermitian square root of a Hermitian matrix $M$.
The factor $f_+$ is known as the canonical (left) spectral factor. In the case where $f$ is a scalar function ($m = 1$)
the canonical spectral factor is explicitly given by

$$f_+(z) = \exp\left\{ \frac{1}{2}\int_{-\pi}^{\pi} \frac{1 + z e^{-j\theta}}{1 - z e^{-j\theta}}\, \log f(\theta)\,\frac{d\theta}{2\pi} \right\}, \quad |z| < 1.$$

As usual, $\mathcal{H}_2(\mathbb{D})$ denotes the Hardy space of functions which are analytic in the unit disk $\mathbb{D}$ with square-integrable
radial limits. Spectral factorization presents an "explicit" expression of the optimal prediction error in the form

$$p(z) = f_+(0)\, f_+^{-1}(z). \tag{6}$$

Thus, $p(z)^{-1}$ is a "normalized" (left) outer factor of $f$. The terminology "outer" refers to a (matrix-valued) function
$g(e^{j\theta})$ for $\theta \in [-\pi, \pi]$ that can be extended into an analytic function in the open interior of the unit disc $\mathbb{D}$ which
is also invertible in $\mathbb{D}$. It is often standard not to differentiate between such a function in $\mathbb{D}$ and the function on
the boundary of radial limits, since these are uniquely defined from one another. In the engineering literature outer
functions are also referred to as "minimum phase." Right-outer factors, where $f(\theta) = f_{+,{\rm right}}(e^{j\theta})^* f_{+,{\rm right}}(e^{j\theta})$
instead of (5), relate to a post-diction optimal estimation problem; in this, the present value of the process is
estimated via linear combination of future values (see e.g., [16]). Only left factorizations will be used in the present
paper.
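The Szegő-Kolmogorov formula (4) can be checked numerically on a case where the canonical factor is known in closed form. The sketch below assumes a hypothetical PSD $f = (I + Bz)\Sigma(I + Bz)^*$ with the eigenvalues of $B$ inside the unit disc, for which $f_+(z) = (I + Bz)\Sigma^{1/2}$ and hence $\Omega = \Sigma$. The matrices `B`, `Sigma` and the function names are illustrative assumptions, not part of the original text.

```python
import numpy as np

def szego_det(f, n_grid=4096):
    """exp of the average of log det f(theta) over a uniform grid on [-pi, pi)."""
    theta = -np.pi + 2 * np.pi * np.arange(n_grid) / n_grid
    logdets = [np.linalg.slogdet(f(t))[1] for t in theta]   # log|det f(theta)|
    return float(np.exp(np.mean(logdets)))

B = np.array([[0.5, 0.2], [0.0, -0.3]])      # eigenvalues 0.5 and -0.3, inside the unit disc
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])   # assumed innovation covariance, so Omega = Sigma

def f_example(theta):
    """f = (I + B z) Sigma (I + B z)^* evaluated at z = e^{j theta}."""
    A = np.eye(2) + B * np.exp(1j * theta)
    return A @ Sigma @ A.conj().T

det_omega_numeric = szego_det(f_example)          # right-hand side of (4)
det_omega_exact = float(np.linalg.det(Sigma))     # det of the known error variance
```

In this example the two values agree to within the accuracy of the frequency grid, which is one way to sanity-check a spectral factorization routine.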
III. Comparison of PSD's

We present two complementary viewpoints on how to compare two PSD's, $f_1$ and $f_2$. In both, the optimal
one-step-ahead predictor for one of the two stochastic processes is applied to the other and compared to the
corresponding optimal. The first is to consider how "white" the power spectrum of the innovations process is.
The second viewpoint is to compare how the error variance degrades with respect to the optimal predictor. Either
principle provides a family of divergence measures and a suitable generalization of the Riemannian geometry of
scalar PSD's given in [7]. There is a close relationship between the two.

A. Prediction errors and innovations processes 

Consider two matrix-valued spectral density functions $f_1$ and $f_2$. Since an optimal filter will be designed based
on one of the two and then evaluated with respect to the other, some notation is in order.

First, let us use a subscript to distinguish between two processes $\mathbf{u}_i(k)$, $i \in \{1, 2\}$, having the $f_i$'s as the
corresponding PSD's. They are assumed purely nondeterministic, vector-valued, and of compatible size. The optimal
filters in the spectral domain are

$$p_i := \operatorname{argmin}\{ [\![p, p]\!]_{f_i} \ :\ p(0) = I, \text{ and } p \in \mathcal{H}_2^{m\times m}(\mathbb{D}) \},$$

and their respective error covariances

$$\Omega_i := [\![p_i, p_i]\!]_{f_i}.$$

Now define

$$\Omega_{i,j} := [\![p_j, p_j]\!]_{f_i}.$$

Clearly, $\Omega_{i,j}$ is the variance of the prediction error when the filter $p_j$ is used on a process having power spectrum
$f_i$. Indeed, if we set

$$\mathbf{p}_{i,j} := \mathbf{u}_i(0) - P_{j,1}\mathbf{u}_i(-1) - P_{j,2}\mathbf{u}_i(-2) - \ldots \tag{7}$$

the prediction-error covariance is

$$[\![\mathbf{p}_{i,j}, \mathbf{p}_{i,j}]\!] = [\![p_j, p_j]\!]_{f_i}.$$

The prediction error $\mathbf{p}_{i,j}$ can also be thought of as a time-process, indexed at time-instant $k \in \mathbb{Z}$,

$$\mathbf{p}_{i,j}(k) := \mathbf{u}_i(k) - P_{j,1}\mathbf{u}_i(k-1) - P_{j,2}\mathbf{u}_i(k-2) - \ldots$$

for $i, j \in \{1, 2\}$. This is an innovations process. Clearly, from stationarity,

$$[\![\mathbf{p}_{i,i}, \mathbf{p}_{i,i}]\!] = \Omega_i,$$

whereas

$$[\![\mathbf{p}_{i,j}, \mathbf{p}_{i,j}]\!] \geq \Omega_i,$$

since in this case $p_j$ is suboptimal for $\mathbf{u}_i$, in general.

B. The color of innovations and PSD mismatch 

We choose to normalize the innovations processes as follows:

$$\mathbf{h}_{i,j}(k) = \Omega_j^{-1/2}\,\mathbf{p}_{i,j}(k), \quad \text{for } k \in \mathbb{Z}.$$

The Kolmogorov isomorphism takes

$$\varphi : \mathbf{h}_{i,j}(k) \mapsto f_{j+}^{-1},$$

with the expectation/inner-product being that induced by $f_i$, and hence, the power spectral density of the process
$\mathbf{h}_{i,j}(k)$ is

$$f_{\mathbf{h}_{i,j}} = f_{j+}^{-1}\, f_i\, f_{j+}^{-*},$$

where $(\cdot)^{-*}$ is a shorthand for $((\cdot)^*)^{-1}$. When $f_i = f_j$, evidently $\{\mathbf{h}_{i,j}(k)\}$ is a white noise process with covariance
matrix equal to the identity.

Naturally, in an absolute sense, the mismatch between the two power spectra $f_i$, $f_j$ can be quantified by the
distance of $f_{\mathbf{h}_{i,j}}$ to the identity. To this end we may consider any symmetrized expression:

$$\int_{-\pi}^{\pi} d\big(f_{j+}^{-1} f_i f_{j+}^{-*},\, I\big)\frac{d\theta}{2\pi} + \int_{-\pi}^{\pi} d\big(f_{i+}^{-1} f_j f_{i+}^{-*},\, I\big)\frac{d\theta}{2\pi} \tag{8}$$

for a suitable distance $d(\cdot, \cdot)$ between positive definite matrices. In general, it is deemed desirable that distances
between power spectra are invariant to scaling (as is the case when distances depend on ratios of spectra, [2]).
Researchers and practitioners alike have insisted on such a property, especially for speech and image systems, due
to an apparent agreement with subjective qualities of sound and images. It is thus interesting to seek multivariable
analogues inherent in the above comparison.

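Due to the non-negative definiteness of power spectra, a convenient option is to take "$d$" as the trace:

$$\int_{-\pi}^{\pi} \Big( \operatorname{tr}\big(f_{i+}^{-1} f_j f_{i+}^{-*} - I\big) + \operatorname{tr}\big(f_{j+}^{-1} f_i f_{j+}^{-*} - I\big) \Big)\frac{d\theta}{2\pi}.$$

This indeed defines a distance measure since $(x + x^{-1} - 2)$ is a non-negative function for $0 < x \in \mathbb{R}$ that vanishes
only when $x = 1$. Thus, we define

$$D_1(f_1, f_2) := \int_{-\pi}^{\pi} \operatorname{tr}\big(f_2^{-1} f_1 + f_1^{-1} f_2 - 2I\big)\frac{d\theta}{2\pi}. \tag{9a}$$

Interestingly, $D_1(f_1, f_2)$ can be re-written as follows:

$$D_1(f_1, f_2) = \int_{-\pi}^{\pi} \big\| f_1^{-1/2} f_2^{1/2} - f_1^{1/2} f_2^{-1/2} \big\|_{\rm Fr}^2\,\frac{d\theta}{2\pi} \tag{9b}$$

where $\|M\|_{\rm Fr}^2 := \operatorname{tr}\, MM^*$ denotes the square of the Frobenius norm.$^1$ This can be readily verified starting from the
right hand side of (9b) and simplifying it to match (9a). It is now easily seen that $D_1(f_i, f_j)$ has a number of
desirable properties listed in the following proposition.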

Proposition 1: Consider $f_i, f_j$ being PSD's of non-deterministic processes and $g(e^{j\theta})$ an arbitrary outer matrix-valued
function in $\mathcal{H}_2^{m\times m}(\mathbb{D})$. The following hold:

(i) $D_1(f_i, f_j) \geq 0$.

(ii) $D_1(f_i, f_j) = 0$ iff $f_i = f_j$ (a.e.).

(iii) $D_1(f_i, f_j) = D_1(f_j, f_i)$.

(iv) $D_1(f_i, f_j) = D_1(f_i^{-1}, f_j^{-1})$.

(v) $D_1(f_i, f_j) = D_1(g f_i g^*, g f_j g^*)$.

Proof: Properties (i-iv) follow immediately from (9b) while the invariance property (v) is most easily seen by
employing (9a). ■
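The properties in Proposition 1 lend themselves to a quick numerical sanity check. The following is a minimal sketch that approximates (9a) by a Riemann sum over PSD samples on a uniform grid; the two test spectra, the congruence factor `G`, and all function names are assumptions made for illustration.

```python
import numpy as np

def ma1_psd(B, theta):
    """Samples of f = (I + B z)(I + B z)^*, z = e^{j theta}; array of shape (n, m, m)."""
    A = np.eye(B.shape[0]) + B * np.exp(1j * theta)[:, None, None]
    return A @ A.conj().swapaxes(-1, -2)

def d1(F1, F2):
    """Riemann-sum approximation of (9a) from PSD samples on a uniform grid."""
    m = F1.shape[-1]
    X = np.linalg.solve(F2, F1) + np.linalg.solve(F1, F2)   # f2^{-1} f1 + f1^{-1} f2
    return float(np.mean(np.trace(X, axis1=-2, axis2=-1).real - 2 * m))

theta = -np.pi + 2 * np.pi * np.arange(512) / 512
F1 = ma1_psd(np.array([[0.5, 0.2], [0.0, -0.3]]), theta)
F2 = ma1_psd(np.array([[0.1, -0.4], [0.3, 0.2]]), theta)

# Hypothetical frequency-dependent congruence g(e^{j theta}) = I + C z, invertible on the circle.
C = np.array([[0.2, 0.1], [0.0, 0.4]])
G = np.eye(2) + C * np.exp(1j * theta)[:, None, None]
GH = G.conj().swapaxes(-1, -2)

symmetric = np.isclose(d1(F1, F2), d1(F2, F1))                                         # (iii)
inverse_invariant = np.isclose(d1(F1, F2), d1(np.linalg.inv(F1), np.linalg.inv(F2)))   # (iv)
congruence_invariant = np.isclose(d1(F1, F2), d1(G @ F1 @ GH, G @ F2 @ GH))            # (v)
```

Since (9a) involves only $f_1^{-1} f_2$ and its inverse, no spectral factorization is needed in this computation.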



C. Suboptimal prediction and PSD mismatch 

We now attempt to quantify how suboptimal the performance of a filter is when this is based on the incorrect 
choice between the two alternative PSD's. To this end, we consider the error covariance and compare it to that of the 
optimal predictor. A basic inequality between these error covariances is summarized in the following proposition. 

Proposition 2: Under our earlier standard assumptions, for $i, j \in \{1, 2\}$ and $\Omega_i, \Omega_j > 0$, it holds that

$$\Omega_{i,j} \geq \Omega_i. \tag{10a}$$

Further, the above holds as an equality iff $p_i = p_j$.
Proof: It follows from the optimality of $p_i$ since

$$[\![p_j, p_j]\!]_{f_i} \geq [\![p_i, p_i]\!]_{f_i} = \Omega_i. \qquad ■$$

$^1$ $\sqrt{\operatorname{tr}\, MM^*}$ is also referred to as the Hilbert-Schmidt norm.
Corollary 3: The following hold:

$$\Omega_i^{-1/2}\,\Omega_{i,j}\,\Omega_i^{-1/2} \geq I, \tag{10b}$$

$$\det(\Omega_{i,j}) \geq \det(\Omega_i), \tag{10c}$$

$$\operatorname{tr}(\Omega_{i,j}) \geq \operatorname{tr}(\Omega_i), \tag{10d}$$

$$\Omega_j^{-1/2}\,\Omega_{i,j}\,\Omega_j^{-1/2} \geq \Omega_j^{-1/2}\,\Omega_i\,\Omega_j^{-1/2}. \tag{10e}$$

Further, each "$\geq$" holds as equality iff $p_i = p_j$.

Thus, a mismatch between the two spectral densities can be quantified by the strength of the above inequalities.
To this end, we may consider a number of alternative "divergence measures". First we consider:

$$D_2(f_i, f_j) := \log\det\big(\Omega_i^{-1/2}\,\Omega_{i,j}\,\Omega_i^{-1/2}\big). \tag{11}$$

Equivalent options leading to the same Riemannian structure are:

$$\frac{1}{m}\operatorname{tr}\big(\Omega_i^{-1/2}\,\Omega_{i,j}\,\Omega_i^{-1/2}\big) - 1, \quad \text{and} \tag{12a}$$

$$\det\big(\Omega_i^{-1/2}\,\Omega_{i,j}\,\Omega_i^{-1/2}\big) - 1. \tag{12b}$$

Using the generalized Szegő-Kolmogorov expression (4) we readily obtain that

$$D_2(f_i, f_j) = \log\det\left( \int_{-\pi}^{\pi} f_{j+}^{-1} f_i f_{j+}^{-*}\,\frac{d\theta}{2\pi} \right) - \int_{-\pi}^{\pi} \log\det\big( f_{j+}^{-1} f_i f_{j+}^{-*} \big)\frac{d\theta}{2\pi}. \tag{13}$$

This expression takes values in $[0, \infty]$, and is zero if and only if the normalized spectral factors $p^{-1} = f_+\Omega^{-1/2}$
are identical for the two spectra. Further, it provides a natural generalization of the divergence measures in [7] and
of the Itakura distance to the case of multivariable spectra. It satisfies "congruence invariance." This is stated next.

Proposition 4: Consider two PSD's $f_i, f_j$ of non-deterministic processes and $g(e^{j\theta})$ an outer matrix-valued
function in $\mathcal{H}_2^{m\times m}(\mathbb{D})$. The following hold:

(i) $D_2(f_i, f_j) \geq 0$.

(ii) $D_2(f_i, f_j) = 0$ iff $p_i = p_j$.

(iii) $D_2(f_i, f_j) = D_2(g f_i g^*, g f_j g^*)$.

Proof: Properties (i-ii) follow immediately from (11) while the invariance property (iii) is most easily seen by
employing (13). To this end, first note that $g f_+$ obviously constitutes the spectral factor of $g f g^*$. Substituting the
corresponding expressions in (13) establishes the invariance. ■

D. Alternative divergence measures 

Obviously, a large family of divergence measures between two matrix-valued power spectra can be obtained
based on (8). For completeness, we suggest representative possibilities, some of which have been independently
considered in recent literature.

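1) Frobenius distance: If we use the Frobenius norm in (8) we obtain

$$D_F(f_1, f_2) := \frac{1}{2}\sum_{i,j} \int_{-\pi}^{\pi} \big\| f_{j+}^{-1} f_i f_{j+}^{-*} - I \big\|_{\rm Fr}^2\,\frac{d\theta}{2\pi} \tag{14a}$$

where $\sum_{i,j}$ designates the "symmetrized sum" taking $(i, j) \in \{(1, 2), (2, 1)\}$. It is straightforward to see that all of

$$f_{j+}^{-1} f_i f_{j+}^{-*}, \quad f_j^{-1/2} f_i f_j^{-1/2} \quad \text{and} \quad f_j^{-1} f_i$$

share the same eigenvalues for any $\theta \in [-\pi, \pi]$. Thus,

$$\big\| f_{j+}^{-1} f_i f_{j+}^{-*} - I \big\|_{\rm Fr} = \big\| f_j^{-1/2} f_i f_j^{-1/2} - I \big\|_{\rm Fr}$$

and

$$D_F(f_1, f_2) = \frac{1}{2}\sum_{i,j} \int_{-\pi}^{\pi} \big\| f_j^{-1/2} f_i f_j^{-1/2} - I \big\|_{\rm Fr}^2\,\frac{d\theta}{2\pi}. \tag{14b}$$

Obviously (14b) is preferable over (14a) since no spectral factorization is involved.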
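A minimal numerical sketch of (14b) follows, computing the Hermitian inverse square root through an eigendecomposition and checking at one frequency that $f_j^{-1/2} f_i f_j^{-1/2}$ and $f_j^{-1} f_i$ share eigenvalues. The test spectra and helper names are hypothetical and serve only as an illustration.

```python
import numpy as np

def ma1_psd(B, theta):
    """Samples of f = (I + B z)(I + B z)^*, z = e^{j theta}; array of shape (n, m, m)."""
    A = np.eye(B.shape[0]) + B * np.exp(1j * theta)[:, None, None]
    return A @ A.conj().swapaxes(-1, -2)

def inv_sqrt(F):
    """Hermitian inverse square root of a stack of positive definite matrices."""
    w, V = np.linalg.eigh(F)
    return (V / np.sqrt(w)[..., None, :]) @ V.conj().swapaxes(-1, -2)

def d_frobenius(F1, F2):
    """Riemann-sum approximation of (14b)."""
    I = np.eye(F1.shape[-1])
    total = 0.0
    for Fi, Fj in ((F1, F2), (F2, F1)):                            # symmetrized sum over (i, j)
        S = inv_sqrt(Fj)
        X = S @ Fi @ S - I
        total += np.mean(np.sum(np.abs(X) ** 2, axis=(-2, -1)))    # squared Frobenius norm
    return 0.5 * float(total)

theta = -np.pi + 2 * np.pi * np.arange(512) / 512
F1 = ma1_psd(np.array([[0.5, 0.2], [0.0, -0.3]]), theta)
F2 = ma1_psd(np.array([[0.1, -0.4], [0.3, 0.2]]), theta)

# Eigenvalue comparison at a single (arbitrarily chosen) frequency sample.
k = 100
S = inv_sqrt(F2)[k]
eig_sym = np.linalg.eigvalsh(S @ F1[k] @ S)
eig_ratio = np.sort(np.linalg.eigvals(np.linalg.solve(F2[k], F1[k])).real)
```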

2) Hellinger distance: A generalization of the Hellinger distance has been proposed in [8] for comparing
multivariable spectra. Briefly, given two positive definite matrices $f_1$ and $f_2$ one seeks factorizations $f_i = g_i g_i^*$
so that the integral over frequencies of the Frobenius distance $\|g_1 - g_2\|_{\rm Fr}^2$ between the factors is minimal. The
factorization does not need to correspond to analytic factors. When one of the two spectra is the identity, the
optimization is trivial and the Hellinger distance becomes

$$\int_{-\pi}^{\pi} \big\| f^{1/2} - I \big\|_{\rm Fr}^2\,\frac{d\theta}{2\pi}.$$

A variation of this idea is to compare the normalized innovation spectra $(f_{j+}^{-1} f_i f_{j+}^{-*})^{1/2}$, for $i, j \in \{1, 2\}$, to the
identity. We do this in a symmetrized fashion so that together with symmetry the metric inherits the inverse-invariance
property. Thus, we define

$$D_H(f_1, f_2) := \sum_{i,j} \int_{-\pi}^{\pi} \big\| (f_{j+}^{-1} f_i f_{j+}^{-*})^{1/2} - I \big\|_{\rm Fr}^2\,\frac{d\theta}{2\pi} = \sum_{i,j} \int_{-\pi}^{\pi} \big\| (f_j^{-1/2} f_i f_j^{-1/2})^{1/2} - I \big\|_{\rm Fr}^2\,\frac{d\theta}{2\pi}. \tag{15}$$

The second equality follows from the fact that $f_{j+}^{-1} f_j^{1/2}$ is a frequency-dependent unitary matrix.

3) Multivariable Itakura-Saito distance: The classical Itakura-Saito distance can be readily generalized by taking

$$d(f, I) = \operatorname{tr}(f - \log f - I).$$

The values are always positive for $I \neq f > 0$ and equal to zero when $f = I$. Thus, we may define

$$D_{IS}(f_1, f_2) := \int_{-\pi}^{\pi} d\big(f_{2+}^{-1} f_1 f_{2+}^{-*},\, I\big)\frac{d\theta}{2\pi} = \int_{-\pi}^{\pi} \big( \operatorname{tr}(f_2^{-1} f_1) - \log\det(f_2^{-1} f_1) - m \big)\frac{d\theta}{2\pi}. \tag{16}$$

The Itakura-Saito distance has its origins in maximum likelihood estimation for speech processing and is related to
the Kullback-Leibler divergence between the probability laws of two Gaussian random processes [2], [12]. More
recently, [9] introduced the matrix-version of the Itakura-Saito distance for solving the state-covariance matching
problem in a multivariable setting.
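The two expressions in (16) can be compared numerically when a spectral factor is available. The sketch below assumes moving-average test spectra $f = (I + Bz)(I + Bz)^*$, for which $f_+(z) = I + Bz$ is the canonical factor provided the eigenvalues of $B$ lie inside the unit disc. These examples and all names are illustrative assumptions.

```python
import numpy as np

def ma1_factor(B, theta):
    """Samples of the factor I + B z, z = e^{j theta}; array of shape (n, m, m)."""
    return np.eye(B.shape[0]) + B * np.exp(1j * theta)[:, None, None]

def d_itakura_saito(F1, F2):
    """Right-hand expression in (16): average of tr(f2^{-1} f1) - log det(f2^{-1} f1) - m."""
    m = F1.shape[-1]
    tr = np.trace(np.linalg.solve(F2, F1), axis1=-2, axis2=-1).real
    logdet = np.linalg.slogdet(F1)[1] - np.linalg.slogdet(F2)[1]
    return float(np.mean(tr - logdet - m))

theta = -np.pi + 2 * np.pi * np.arange(512) / 512
Fp1 = ma1_factor(np.array([[0.5, 0.2], [0.0, -0.3]]), theta)
Fp2 = ma1_factor(np.array([[0.1, -0.4], [0.3, 0.2]]), theta)
F1 = Fp1 @ Fp1.conj().swapaxes(-1, -2)
F2 = Fp2 @ Fp2.conj().swapaxes(-1, -2)

# Left-hand expression in (16): d(A, I) with A = f2+^{-1} f1 f2+^{-*}, via the eigenvalues of A.
Fp2_inv = np.linalg.inv(Fp2)
A = Fp2_inv @ F1 @ Fp2_inv.conj().swapaxes(-1, -2)
lam = np.linalg.eigvalsh(A)
d_is_via_factor = float(np.mean(np.sum(lam - np.log(lam) - 1.0, axis=-1)))

d_is_direct = d_itakura_saito(F1, F2)
```

Unlike $D_1$, this divergence is not symmetric in its arguments, so the order of `F1` and `F2` matters.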

4) Log-spectral deviation: It has been argued that a logarithmic measure of spectral deviations is in agreement
with perceptive qualities of sound and for this reason it has formed the basis for the oldest distortion measures
considered [2]. In particular, the $L_2$ distance between the logarithms of power spectra is referred to as "Log-spectral
deviation" or the "logarithmic energy." A natural multivariable version is to consider

$$d(f, I) = \|\log(f)\|_{\rm Fr}^2.$$

This expression is already symmetrized, since $d(f, I) = d(f^{-1}, I)$ by virtue of the fact that the eigenvalues of
$\log(f)$ and those of $\log(f^{-1})$ differ only in their sign. Thereby,

$$\big\| \log(f_{i+}^{-1} f_j f_{i+}^{-*}) \big\|_{\rm Fr}^2 = \big\| \log(f_{j+}^{-1} f_i f_{j+}^{-*}) \big\|_{\rm Fr}^2.$$

Thus we define

$$D_{\rm Log}(f_1, f_2) := \int_{-\pi}^{\pi} \big\| \log(f_{1+}^{-1} f_2 f_{1+}^{-*}) \big\|_{\rm Fr}^2\,\frac{d\theta}{2\pi} = \int_{-\pi}^{\pi} \big\| \log(f_1^{-1/2} f_2 f_1^{-1/2}) \big\|_{\rm Fr}^2\,\frac{d\theta}{2\pi}. \tag{17}$$

This represents a multivariable version of the log-spectral deviation (see [2, page 370]). Interestingly, as we will
see later on, $D_{\rm Log}(f_1, f_2)$ possesses several useful properties and, in fact, its square root turns out to be precisely
a geodesic distance in a suitable Riemannian geometry.
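Because the eigenvalues of $f_1^{-1/2} f_2 f_1^{-1/2}$ coincide with those of $f_1^{-1} f_2$, the divergence (17) can be evaluated from a generalized eigenvalue computation alone. The following sketch, with hypothetical test spectra and function names, does so on a frequency grid and checks symmetry together with inverse, congruence and scaling invariance.

```python
import numpy as np

def ma1_psd(B, theta):
    """Samples of f = (I + B z)(I + B z)^*, z = e^{j theta}; array of shape (n, m, m)."""
    A = np.eye(B.shape[0]) + B * np.exp(1j * theta)[:, None, None]
    return A @ A.conj().swapaxes(-1, -2)

def d_log(F1, F2):
    """Riemann-sum approximation of (17) using the eigenvalues of f1^{-1} f2."""
    lam = np.linalg.eigvals(np.linalg.solve(F1, F2)).real   # real and positive up to rounding
    return float(np.mean(np.sum(np.log(lam) ** 2, axis=-1)))

theta = -np.pi + 2 * np.pi * np.arange(512) / 512
F1 = ma1_psd(np.array([[0.5, 0.2], [0.0, -0.3]]), theta)
F2 = ma1_psd(np.array([[0.1, -0.4], [0.3, 0.2]]), theta)

# Hypothetical congruence factor g(e^{j theta}) = I + C z, invertible on the circle.
C = np.array([[0.2, 0.1], [0.0, 0.4]])
G = np.eye(2) + C * np.exp(1j * theta)[:, None, None]
GH = G.conj().swapaxes(-1, -2)

base = d_log(F1, F2)
symmetric = np.isclose(base, d_log(F2, F1))
inverse_invariant = np.isclose(base, d_log(np.linalg.inv(F1), np.linalg.inv(F2)))
congruence_invariant = np.isclose(base, d_log(G @ F1 @ GH, G @ F2 @ GH))
scale_invariant = np.isclose(base, d_log(3.0 * F1, 3.0 * F2))
```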
IV. Riemannian Structure on Multivariate Spectra

Consider a "small" perturbation $f + \Delta$ away from a nominal power spectral density $f$. All divergence measures
that we have seen so far are continuous in their arguments and, in-the-small, can be approximated by a quadratic
form in $\Delta$ which depends continuously on $f$. This is what is referred to as a Riemannian metric. The availability of
a metric gives the space of power spectral densities its properties. It dictates how perturbations in various directions
compare to each other. It also provides additional important concepts: geodesics, geodesic distances, and curvature.
Geodesics are paths of smallest length connecting the start to the finish; this length is the geodesic distance. Thus,
geodesics in the space of power spectral densities represent deformations from a starting power spectral density
$f_0$ to an end "point" $f_1$. Curvature on the other hand is intimately connected with approximation and convexity of
sets.

In contrast to a general divergence measure, the geodesic distance obeys the triangle inequality and thus, it is a
metric (or, a pseudo-metric when by design it is unaffected by scaling or other group of transformations). Geodesics
are also natural structures for modeling changes and deformations. In fact, a key motivation behind the present
work is to model time-varying spectra via geodesic paths in a suitable metric space. This viewpoint provides a
non-parametric model for non-stationary spectra, analogous to a spectrogram, but one which takes into account the
inherent geometry of power spectral densities.

Thus, in the sequel we consider infinitesimal perturbations about a given power spectral density function. We
explain how these give rise to nonnegative definite quadratic forms. Throughout, we assume that all functions are
smooth enough so that the indicated integrals exist. This can be ensured if all spectral density functions are bounded
with bounded derivatives and inverses. Thus, we will restrict our attention to the following class of PSD's:

$$\mathcal{F} := \{ f \mid m \times m \text{ positive definite, differentiable on } [-\pi, \pi], \text{ with continuous derivative} \}.$$

In the above, we identify the end points of $[-\pi, \pi]$ since $f$ is thought of as a function on the unit circle. Since the
functions $f$ are strictly positive definite and bounded, tangent directions of $\mathcal{F}$ consist of admissible perturbations
$\Delta$. These need only be restricted to be differentiable with square integrable derivative, hence the tangent space at
any $f \in \mathcal{F}$ can be identified with

$$\mathcal{D} := \{ \Delta \mid \text{differentiable on } [-\pi, \pi] \text{ with continuous derivative} \}.$$

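A. Geometry based on the "flatness" of innovations spectra

We first consider the divergence $D_1$ in (9a)-(9b) which quantifies how far the PSD of the normalized innovations
process is from being constant and equal to the identity. The induced Riemannian metric takes the form

$$g_{1,f}(\Delta) := \int_{-\pi}^{\pi} \big\| f^{-1/2} \Delta f^{-1/2} \big\|_{\rm Fr}^2\,\frac{d\theta}{2\pi}. \tag{18a}$$

Proposition 5: Let $(f, \Delta) \in \mathcal{F} \times \mathcal{D}$ and $\epsilon > 0$. Then, for $\epsilon$ sufficiently small,

$$D_1(f, f + \epsilon\Delta) = g_{1,f}(\epsilon\Delta) + O(\epsilon^3).$$

Proof: First note that

$$\operatorname{tr}\big(f(f + \epsilon\Delta)^{-1}\big) = \operatorname{tr}\big(f^{1/2}(I + f^{-1/2}\epsilon\Delta f^{-1/2})^{-1} f^{-1/2}\big) = \operatorname{tr}\big(I + f^{-1/2}\epsilon\Delta f^{-1/2}\big)^{-1}$$

$$= m - \operatorname{tr}\big(f^{-1/2}\epsilon\Delta f^{-1/2}\big) + \big\| f^{-1/2}\epsilon\Delta f^{-1/2} \big\|_{\rm Fr}^2 + O(\epsilon^3).$$

Likewise,

$$\operatorname{tr}\big((f + \epsilon\Delta) f^{-1}\big) = m + \operatorname{tr}\big(\epsilon\Delta f^{-1}\big).$$

Therefore,

$$D_1(f, f + \epsilon\Delta) = \operatorname{tr}\int_{-\pi}^{\pi} \big( f(f + \epsilon\Delta)^{-1} + (f + \epsilon\Delta) f^{-1} - 2I \big)\frac{d\theta}{2\pi} = \int_{-\pi}^{\pi} \big\| f^{-1/2}\epsilon\Delta f^{-1/2} \big\|_{\rm Fr}^2\,\frac{d\theta}{2\pi} + O(\epsilon^3),$$

which is $g_{1,f}(\epsilon\Delta) + O(\epsilon^3)$. ■

Obviously, an alternative expression for $g_{1,f}$ that requires neither spectral factorization nor the computation of
the Hermitian square root of $f$, is the following:

$$g_{1,f}(\Delta) = \int_{-\pi}^{\pi} \operatorname{tr}\big(f^{-1}\Delta f^{-1}\Delta\big)\frac{d\theta}{2\pi}. \tag{18b}$$

It is interesting to also note that any of (14), (15), (16), and (17) leads to the same Riemannian metric.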
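The cubic remainder in Proposition 5 can be observed numerically. In the sketch below, the gap between $D_1(f, f + \epsilon\Delta)$ and $g_{1,f}(\epsilon\Delta)$ is evaluated for two values of $\epsilon$ a decade apart; under the proposition the gap should shrink by roughly three orders of magnitude. The test spectrum, the constant perturbation `H`, and the function names are assumptions for illustration.

```python
import numpy as np

def ma1_psd(B, theta):
    """Samples of f = (I + B z)(I + B z)^*, z = e^{j theta}; array of shape (n, m, m)."""
    A = np.eye(B.shape[0]) + B * np.exp(1j * theta)[:, None, None]
    return A @ A.conj().swapaxes(-1, -2)

def d1(F1, F2):
    """Riemann-sum approximation of (9a)."""
    m = F1.shape[-1]
    X = np.linalg.solve(F2, F1) + np.linalg.solve(F1, F2)
    return float(np.mean(np.trace(X, axis1=-2, axis2=-1).real - 2 * m))

def g1(F, D):
    """Riemann-sum approximation of (18b): average of tr(f^{-1} D f^{-1} D)."""
    X = np.linalg.solve(F, D)
    return float(np.mean(np.trace(X @ X, axis1=-2, axis2=-1).real))

theta = -np.pi + 2 * np.pi * np.arange(512) / 512
F = ma1_psd(np.array([[0.5, 0.2], [0.0, -0.3]]), theta)

# Hypothetical perturbation direction: a constant Hermitian matrix at every frequency.
H = np.array([[1.0, 0.3], [0.3, 0.5]])
Delta = np.tile(H, (theta.size, 1, 1))

def gap(eps):
    """|D_1(f, f + eps*Delta) - g_{1,f}(eps*Delta)|, expected to scale like eps**3."""
    return abs(d1(F, F + eps * Delta) - g1(F, eps * Delta))

ratio = gap(0.01) / gap(0.1)   # expected to be of order 1e-3
```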

B. Geometry based on suboptimality of prediction 

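The paradigm in [7] for a Riemannian structure of scalar power spectral densities was originally built on the
degradation of predictive error variance, as this is reflected in the strength of the inequalities of Proposition 2. In this
section we explore the direct generalization of that route. Thus, we consider the quadratic form which $\mathcal{F}$ inherits
from the relevant divergence $D_2$, defined in (11). The next proposition shows that this defines the corresponding
metric:

$$g_{2,f}(\Delta) := \operatorname{tr}\int_{-\pi}^{\pi} \big(f_+^{-1}\Delta f_+^{-*}\big)^2\,\frac{d\theta}{2\pi} - \operatorname{tr}\left( \int_{-\pi}^{\pi} f_+^{-1}\Delta f_+^{-*}\,\frac{d\theta}{2\pi} \right)^2$$

$$= g_{1,f}(\Delta) - \operatorname{tr}\left( \int_{-\pi}^{\pi} f_+^{-1}\Delta f_+^{-*}\,\frac{d\theta}{2\pi} \right)^2. \tag{19}$$

Proposition 6: Let $(f, \Delta) \in \mathcal{F} \times \mathcal{D}$ and $\epsilon > 0$. Then, for $\epsilon$ sufficiently small,

$$D_2(f, f + \epsilon\Delta) = \frac{1}{2}\, g_{2,f}(\epsilon\Delta) + O(\epsilon^3).$$

Proof: In order to simplify the notation let

$$\Delta_\epsilon := f_+^{-1}\,\epsilon\Delta\, f_+^{-*}.$$

Since $\Delta$, $f$ are both bounded, $|\operatorname{tr}(\Delta_\epsilon^k)| = O(\epsilon^k)$ as well as $|\operatorname{tr}(\int_{-\pi}^{\pi} \Delta_\epsilon \frac{d\theta}{2\pi})^k| = O(\epsilon^k)$. Using a Taylor series
expansion,

$$\operatorname{tr}\log\left( I + \int_{-\pi}^{\pi} \Delta_\epsilon\,\frac{d\theta}{2\pi} \right) = \operatorname{tr}\left( \int_{-\pi}^{\pi} \Delta_\epsilon\,\frac{d\theta}{2\pi} \right) - \frac{1}{2}\operatorname{tr}\left( \int_{-\pi}^{\pi} \Delta_\epsilon\,\frac{d\theta}{2\pi} \right)^2 + O(\epsilon^3),$$

while

$$\operatorname{tr}\left( \int_{-\pi}^{\pi} \log\big(f_+^{-1}(f + \epsilon\Delta) f_+^{-*}\big)\frac{d\theta}{2\pi} \right) = \int_{-\pi}^{\pi} \operatorname{tr}\log(I + \Delta_\epsilon)\,\frac{d\theta}{2\pi} = \operatorname{tr}\left( \int_{-\pi}^{\pi} \Delta_\epsilon\,\frac{d\theta}{2\pi} - \frac{1}{2}\int_{-\pi}^{\pi} \Delta_\epsilon^2\,\frac{d\theta}{2\pi} \right) + O(\epsilon^3).$$

Thus, subtracting the second expansion from the first as dictated by (13),

$$D_2(f, f + \epsilon\Delta) = \frac{1}{2}\left( \operatorname{tr}\int_{-\pi}^{\pi} \Delta_\epsilon^2\,\frac{d\theta}{2\pi} - \operatorname{tr}\left( \int_{-\pi}^{\pi} \Delta_\epsilon\,\frac{d\theta}{2\pi} \right)^2 \right) + O(\epsilon^3) = \frac{1}{2}\, g_{2,f}(\epsilon\Delta) + O(\epsilon^3). \qquad ■$$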
Evidently, $g_{2,f}$ and $g_{1,f}$ are closely related. The other choices of $D$ similarly yield either $g_{1,f}$, as noted earlier,
or $g_{2,f}$. In fact, $g_{2,f}$ can be derived based on (12).

We remark a substantial difference between $g_{1,f}$ and $g_{2,f}$. In contrast to $g_{2,f}$, evaluation of $g_{1,f}$ does not require
computing $f_+$. On the other hand, both $g_{1,f}$ and $g_{2,f}$ are similarly unaffected by consistent scaling of $f$
and $\Delta$.

V. Geometry on positive matrices 

As indicated earlier, a Riemannian metric $g_M(\Delta)$ on the space of Hermitian $m \times m$ matrices is a family of
quadratic forms originating from inner products that depend smoothly on the Hermitian "foot point" $M$; the
standard Hilbert-Schmidt metric $g_{\rm HS}(\Delta) = \langle \Delta, \Delta \rangle := \operatorname{tr}(\Delta^2)$ is one such. Of particular interest are metrics
on the space of positive definite matrices that ensure the space is complete and geodesically complete.$^2$ For our
purposes, matrices typically represent covariances. To this end a standard recipe for constructing a Riemannian
metric is to begin with an information potential, such as the Boltzmann entropy of a Gaussian distribution with
zero mean and covariance $M$,

$$S(M) := -\frac{1}{2}\log(\det(M)) + \text{constant},$$

and define an inner product via its Hessian. The Riemannian metric so defined,

$$g_M(\Delta) := \operatorname{tr}\big(M^{-1}\Delta M^{-1}\Delta\big) = \big\| M^{-1/2}\Delta M^{-1/2} \big\|_{\rm Fr}^2,$$

is none other than the Fisher-Rao metric on Gaussian distributions expressed in the space of the corresponding
covariance matrices.

The relationship of the Fisher-Rao metric on Gaussian distributions with the metric $g_{1,f}$ in (18b) is rather evident.
Indeed, $g_M$ coincides with $g_{1,f}$ for power spectra which are constant across frequencies, i.e., taking $f = M$ to be
a constant Hermitian positive definite matrix.

It is noted that $g_M(\Delta)$ remains invariant under congruence, that is,

$$g_M(\Delta) = g_{TMT^*}(T\Delta T^*)$$

for any square invertible matrix-function $T$. This is a natural property to demand since it implies that the distance
between covariance matrices does not change under coordinate transformations. The same is inherited by $g_{1,f}$ for
power spectra. It is for this reason that $g_M$ has in fact been extensively studied in the context of general $C^*$-algebras
and their positive elements; we refer to [11, pg. 201-235] for a nice exposition of relevant material and for further
references. Below we highlight certain key facts that are relevant to this paper. But first, and for future reference,
we recall a standard result in differential geometry.
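As a brief aside before that result is stated, the congruence invariance of $g_M$ noted above can be checked numerically. The sketch below uses randomly generated matrices with a fixed seed; the dimensions, the seed, and the function name `g_M` are assumptions chosen purely for illustration.

```python
import numpy as np

def g_M(M, Delta):
    """Quadratic form tr(M^{-1} Delta M^{-1} Delta) at the foot point M."""
    X = np.linalg.solve(M, Delta)
    return float(np.trace(X @ X).real)

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
M = A @ A.conj().T + np.eye(3)            # Hermitian positive definite foot point
H = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
Delta = H + H.conj().T                     # Hermitian tangent direction
T = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))   # invertible with probability one

lhs = g_M(M, Delta)
rhs = g_M(T @ M @ T.conj().T, T @ Delta @ T.conj().T)

# Equivalent Frobenius-norm form, using the Hermitian inverse square root of M.
w, V = np.linalg.eigh(M)
M_inv_sqrt = (V / np.sqrt(w)) @ V.conj().T
frob_form = float(np.sum(np.abs(M_inv_sqrt @ Delta @ M_inv_sqrt) ** 2))
```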

Proposition 7: Let $\mathcal{M}$ be a Riemannian manifold with $\|\Delta\|_M^2$ denoting the Riemannian metric at $M \in \mathcal{M}$ and
$\Delta$ a tangent direction at $M$. For each pair of points $M_0, M_1 \in \mathcal{M}$ consider the path space

$$\Theta_{M_0, M_1} := \{ M_\tau : [0, 1] \to \mathcal{M} \ :\ M_\tau \text{ is a piecewise smooth path connecting the two given points} \}.$$

Denote by $\dot{M}_\tau := dM_\tau/d\tau$. The arc-length

$$\int_0^1 \|\dot{M}_\tau\|_{M_\tau}\, d\tau,$$

as well as the "action/energy" functional

$$\int_0^1 \|\dot{M}_\tau\|_{M_\tau}^2\, d\tau$$

attain a minimum at a common path in $\Theta_{M_0, M_1}$. Further, the minimal value of the arclength is the square root of
the minimal value of the energy functional, and on a minimizing path the "speed" $\|\dot{M}_\tau\|_{M_\tau}$ remains constant for
$\tau \in [0, 1]$.

Proof: See [17, pg. 137]. ■

$^2$ A space is complete when Cauchy sequences converge to points in the space. It is geodesically complete when the definition domain of
geodesics extends to the complete real line $\mathbb{R}$; i.e., extrapolating the path beyond the end points remains always in the space.

The insight behind the statement of the proposition is as follows. The arclength is evidently unaffected by a
re-parametrization of a geodesic connecting the two points. The "energy" functional, on the other hand, is minimized
for a specific parametrization of the geodesic, namely the one where the velocity stays constant. Thus, the two are
intimately related. The proposition will be applied first to paths between matrices, but in the next section it will
also be invoked for geodesics between power spectra.

Herein we are interested in geodesic paths $M_\tau$, $\tau \in [0,1]$, connecting positive definite matrices $M_0$ to $M_1$ and
in computing the corresponding geodesic distances

$$d_g(M_0, M_1) = \int_0^1 \|M_\tau^{-1/2}\dot{M}_\tau M_\tau^{-1/2}\|_{Fr}\, d\tau.$$

Recall that a geodesic $M_\tau$ is the shortest path on the manifold connecting the beginning to the end.

Theorem 8: Given Hermitian positive matrices $M_0, M_1$, the geodesic between them with respect to $g_M$ is unique
(modulo re-parametrization) and given by

$$M_\tau = M_0^{1/2}(M_0^{-1/2}M_1M_0^{-1/2})^{\tau}M_0^{1/2}, \tag{20}$$

for $0 \le \tau \le 1$. Further, it holds that

$$d_g(M_0, M_\tau) = \tau\, d_g(M_0, M_1), \text{ for } \tau \in [0,1],$$

and the geodesic distance is

$$d_g(M_0, M_1) = \|\log(M_0^{-1/2}M_1M_0^{-1/2})\|_{Fr}.$$
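The formulas in Theorem 8 are straightforward to evaluate numerically. The following is a minimal sketch, assuming NumPy and real symmetric (or complex Hermitian) positive definite inputs; the helper names `herm_fun`, `whiten`, `geodesic_point` and `geodesic_distance`, as well as the two test matrices, are illustrative choices and do not come from the paper.

```python
import numpy as np

def herm_fun(M, fun):
    """Apply a scalar function to a Hermitian matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * fun(w)) @ V.conj().T

def whiten(M0, M1):
    """Return M0^{-1/2} M1 M0^{-1/2}, symmetrized against round-off."""
    R_inv = herm_fun(M0, lambda w: 1.0 / np.sqrt(w))
    X = R_inv @ M1 @ R_inv
    return 0.5 * (X + X.conj().T)

def geodesic_point(M0, M1, tau):
    """Point M_tau on the path (20) between positive definite M0 and M1."""
    R = herm_fun(M0, np.sqrt)
    M = R @ herm_fun(whiten(M0, M1), lambda w: w ** tau) @ R
    return 0.5 * (M + M.conj().T)

def geodesic_distance(M0, M1):
    """Frobenius norm of log(M0^{-1/2} M1 M0^{-1/2})."""
    w = np.linalg.eigvalsh(whiten(M0, M1))
    return float(np.sqrt(np.sum(np.log(w) ** 2)))

# Hypothetical test matrices (both positive definite).
M0 = np.array([[2.0, 0.5], [0.5, 1.0]])
M1 = np.array([[1.0, -0.3], [-0.3, 3.0]])
M_half = geodesic_point(M0, M1, 0.5)  # midpoint of the geodesic

# The two printed numbers should agree, since d_g(M0, M_tau) = tau * d_g(M0, M1).
print(geodesic_distance(M0, M_half), 0.5 * geodesic_distance(M0, M1))
```

The distance is computed from the eigenvalues of the whitened matrix only, which avoids forming the matrix logarithm explicitly.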

Proof: A proof is given in [11, Theorem 6.1.6, pg. 205]. However, since this is an important result for our
purposes and for completeness, we provide an independent short proof relying on Pontryagin's minimum principle.
We first note that, since $g_M$ is congruence invariant, the path $TM_\tau T^*$ is a geodesic between $TM_0T^*$ and $TM_1T^*$,
for any invertible matrix $T$. Further, the geodesic length is independent of $T$. Thus, we set

$$T = M_0^{-1/2},$$

and seek a geodesic path between

$$X_0 = I \text{ and } X_1 = M_0^{-1/2}M_1M_0^{-1/2}. \tag{21}$$

Appealing to Proposition 7 we seek

$$\min\Big\{\int_0^1 \mathrm{tr}(X_\tau^{-1}U_\tau X_\tau^{-1}U_\tau)\,d\tau \ \text{ subject to } \dot{X}_\tau = U_\tau, \text{ and } X_0, X_1 \text{ specified}\Big\}. \tag{22}$$



Now, (22) is a standard optimal control problem. The value of the optimal control must annihilate the variation,
with respect to the "control" $U_\tau$, of the Hamiltonian

$$\mathrm{tr}(X_\tau^{-1}U_\tau X_\tau^{-1}U_\tau) + \mathrm{tr}(\Lambda_\tau U_\tau).$$

Here, $\Lambda_\tau$ represents the co-state (i.e., Lagrange multiplier functions). The variation is

$$\mathrm{tr}(2X_\tau^{-1}U_\tau X_\tau^{-1}\delta_U + \Lambda_\tau\delta_U)$$

and this being identically zero for all $\delta_U$ implies that

$$U_\tau = -\tfrac{1}{2}X_\tau\Lambda_\tau X_\tau. \tag{23}$$

Similarly, the co-state equation is obtained by considering the variation with respect to $X$. This gives

$$\dot{\Lambda}_\tau = 2X_\tau^{-1}U_\tau X_\tau^{-1}U_\tau X_\tau^{-1}.$$

Substitute the expression for $U_\tau$ into the state and the co-state equations to obtain

$$\dot{X}_\tau = -\tfrac{1}{2}X_\tau\Lambda_\tau X_\tau, \qquad \dot{\Lambda}_\tau = \tfrac{1}{2}\Lambda_\tau X_\tau\Lambda_\tau.$$

Note that

$$\dot{X}_\tau\Lambda_\tau + X_\tau\dot{\Lambda}_\tau = 0,$$

identically, for all $\tau$. Hence, the product $X_\tau\Lambda_\tau$ is constant. Set

$$X_\tau\Lambda_\tau = -2C. \tag{24}$$

The state equation becomes

$$\dot{X}_\tau = CX_\tau.$$

The solution with initial condition $X_0 = I$ is

$$X_\tau = \exp(C\tau).$$

Matching (21) requires that $\exp(C) = X_1 = M_0^{-1/2}M_1M_0^{-1/2}$. Thus, $X_\tau = (M_0^{-1/2}M_1M_0^{-1/2})^{\tau}$ and the geodesic is as
claimed. Further,

$$C = \log(M_0^{-1/2}M_1M_0^{-1/2})$$

while $U_\tau = CX_\tau$ from (24) and (23). So finally, for the minimizing choice of $U_\tau$ we get that the cost

$$\int_0^1 \mathrm{tr}(X_\tau^{-1}U_\tau X_\tau^{-1}U_\tau)\,d\tau = \int_0^1 \mathrm{tr}(C^2)\,d\tau = \|\log(M_0^{-1/2}M_1M_0^{-1/2})\|_{Fr}^2,$$

as claimed. ■
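The conclusion of the proof can be checked numerically by discretizing the energy functional in (22). The sketch below, assuming NumPy, compares the path $X_\tau = \exp(C\tau)$ with the straight segment $(1-\tau)I + \tau X_1$ for one hypothetical end point $X_1$; the midpoint-rule discretization and the names `path_energy`, `geodesic_path` and `straight_path` are illustrative and not part of the paper.

```python
import numpy as np

def herm_fun(M, fun):
    """Apply a scalar function to a Hermitian matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * fun(w)) @ V.conj().T

def path_energy(path):
    """Midpoint-rule estimate of int_0^1 tr(X^{-1} U X^{-1} U) dtau with U = dX/dtau,
    for a path sampled uniformly on [0, 1] (array of shape (N + 1, m, m))."""
    h = 1.0 / (len(path) - 1)
    total = 0.0
    for Xa, Xb in zip(path[:-1], path[1:]):
        U = (Xb - Xa) / h                         # finite-difference velocity
        W = np.linalg.solve(0.5 * (Xa + Xb), U)   # X^{-1} U at the midpoint
        total += np.trace(W @ W).real * h
    return total

# Hypothetical end point X1 (positive definite); X0 is the identity.
X1 = np.array([[2.0, 1.0], [1.0, 3.0]])
C = herm_fun(X1, np.log)
taus = np.linspace(0.0, 1.0, 2001)

geodesic_path = np.array([herm_fun(X1, lambda w, t=t: w ** t) for t in taus])
straight_path = np.array([(1.0 - t) * np.eye(2) + t * X1 for t in taus])

energy_geodesic = path_energy(geodesic_path)
energy_straight = path_energy(straight_path)
print(energy_geodesic, np.sum(C * C), energy_straight)
```

Up to discretization error, the first two printed values coincide (the energy along the geodesic equals $\|C\|_{Fr}^2$), while the straight segment has strictly larger energy.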

Remark 9: It is important to point out the lower bound

$$d_g(M_0, M_1) \ge \|\log M_0 - \log M_1\|_{Fr}, \tag{25}$$

on the geodesic distance, which holds with equality when $M_0$ and $M_1$ commute. This is known as the exponential
metric increasing property [11, page 203] and will be used later on. □
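As a quick sanity check of (25), the sketch below (assuming NumPy) evaluates both sides for one non-commuting pair and one commuting (diagonal) pair. The matrices and the function names are hypothetical and serve only as an illustration.

```python
import numpy as np

def herm_fun(M, fun):
    """Apply a scalar function to a Hermitian matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * fun(w)) @ V.conj().T

def geodesic_distance(M0, M1):
    """|| log(M0^{-1/2} M1 M0^{-1/2}) ||_Fr."""
    R_inv = herm_fun(M0, lambda w: 1.0 / np.sqrt(w))
    X = R_inv @ M1 @ R_inv
    w = np.linalg.eigvalsh(0.5 * (X + X.conj().T))
    return float(np.sqrt(np.sum(np.log(w) ** 2)))

def log_difference_norm(M0, M1):
    """|| log M0 - log M1 ||_Fr, the right-hand side of (25)."""
    return float(np.linalg.norm(herm_fun(M0, np.log) - herm_fun(M1, np.log), "fro"))

# Hypothetical non-commuting pair.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
B = np.array([[1.0, -0.3], [-0.3, 3.0]])
# Hypothetical commuting pair (both diagonal).
D0 = np.diag([1.0, 2.0])
D1 = np.diag([3.0, 0.5])

print(geodesic_distance(A, B), log_difference_norm(A, B))      # left >= right
print(geodesic_distance(D0, D1), log_difference_norm(D0, D1))  # equal
```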

The mid point of the geodesic path in (20) is what is known as the geometric mean of the two matrices $M_0$ and
$M_1$. This is commonly denoted by

$$M_{1/2} := M_0 \sharp M_1.$$

Similar notation, with the addition of a subscript $\tau$, will be used to designate the complete geodesic path

$$M_\tau = M_0 \sharp_\tau M_1 := M_0^{1/2}(M_0^{-1/2}M_1M_0^{-1/2})^{\tau}M_0^{1/2}$$

(see [11]). A number of useful properties can be easily verified:

i) Congruence invariance: for any invertible matrix $T$,

$$d_g(M_0, M_1) = d_g(TM_0T^*, TM_1T^*).$$

ii) Inverse invariance:

$$d_g(M_0, M_1) = d_g(M_0^{-1}, M_1^{-1}).$$

iii) The metric satisfies the semiparallelogram law.

iv) The space of positive definite matrices metrized by $d_g$ is complete; that is, any Cauchy sequence of positive
definite matrices converges to a positive definite matrix.

v) Given any three "points" $M_0, M_1, M_2$,

$$d_g(M_0 \sharp_\tau M_1, M_0 \sharp_\tau M_2) \le \tau\, d_g(M_1, M_2),$$

which implies that geodesics diverge at least as fast as "Euclidean geodesics".
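Properties i), ii) and v) lend themselves to a direct numerical check. The following sketch, assuming NumPy, evaluates them for three hypothetical positive definite matrices and one hypothetical invertible $T$; it illustrates the statements and is not a proof. All names are illustrative.

```python
import numpy as np

def herm_fun(M, fun):
    """Apply a scalar function to a Hermitian matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V * fun(w)) @ V.conj().T

def whiten(M0, M1):
    """Return M0^{-1/2} M1 M0^{-1/2}, symmetrized against round-off."""
    R_inv = herm_fun(M0, lambda w: 1.0 / np.sqrt(w))
    X = R_inv @ M1 @ R_inv
    return 0.5 * (X + X.conj().T)

def sharp(M0, M1, tau):
    """The point M0 #_tau M1 on the geodesic from M0 to M1."""
    R = herm_fun(M0, np.sqrt)
    M = R @ herm_fun(whiten(M0, M1), lambda w: w ** tau) @ R
    return 0.5 * (M + M.conj().T)

def d_g(M0, M1):
    """Geodesic distance || log(M0^{-1/2} M1 M0^{-1/2}) ||_Fr."""
    w = np.linalg.eigvalsh(whiten(M0, M1))
    return float(np.sqrt(np.sum(np.log(w) ** 2)))

# Hypothetical data: three positive definite matrices and an invertible T.
M0 = np.array([[2.0, 0.5], [0.5, 1.0]])
M1 = np.array([[1.0, -0.3], [-0.3, 3.0]])
M2 = np.array([[1.5, 0.2], [0.2, 0.7]])
T = np.array([[1.0, 2.0], [0.0, 1.0]])
tau = 0.4

d_congruent = d_g(T @ M0 @ T.T, T @ M1 @ T.T)        # property i)
d_inverse = d_g(np.linalg.inv(M0), np.linalg.inv(M1))  # property ii)
lhs = d_g(sharp(M0, M1, tau), sharp(M0, M2, tau))      # property v), left side
rhs = tau * d_g(M1, M2)                                # property v), right side
print(d_g(M0, M1), d_congruent, d_inverse, lhs, rhs)
```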

Remark 10: Property v) implies that the Riemannian manifold of positive definite matrices with metric $d_g$ has
nonpositive sectional curvature [18, pg. 39-40]. The nonpositive sectional curvature of a simply connected complete
Riemannian manifold has several important geometric consequences. It implies the existence and uniqueness of a
geodesic connecting any two points on the manifold [18, pg. 3-4]. Convex sets on such a manifold are defined by the
requirement that geodesics between any two points in the set lie entirely in the set [18, pg. 67]. Then, "projections"
onto the set exist in that there is always a closest point within a convex set to any given point. Evidently, such a
property should be valuable in applications, such as speaker identification or speech recognition based on a database
of speech segments; e.g., models may be taken as the "convex hull" of prior sample spectra and the metric distance
of a new sample compared to how far it resides from a given such convex set. Another property of such a manifold
is that the center of mass of a set of points is contained in the closure of its convex hull [18, pg. 68]; this property
has been used to define the geometric means of symmetric positive matrices in [19]. □

VI. Geodesics and geodesic distances 

Power spectral densities are families of Hermitian matrices parametrized by the frequency $\theta$, and as such, can be
thought of as positive operators on a Hilbert space. Geometries for positive operators have been extensively studied
for some time now, and power spectral densities may in principle be studied with similar tools. However, what
may be somewhat surprising is that the geometries obtained earlier, based on the innovations flatness and optimal
prediction, have points of contact with this literature. This was seen in the correspondence between the metrics that
we derived.

In the earlier sections we introduced two metrics, $g_1$ and $g_2$. Although there is a close connection between the
two, as suggested by (19), it is only for the former that we are able to identify geodesics and compute the geodesic
lengths, based on the material in Section V. We do this next.

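Theorem 11: There exists a unique geodesic path $f_\tau$ with respect to $g_{1,f}$, connecting any two spectra $f_0, f_1 \in \mathcal{F}$.
The geodesic path is

$$f_\tau = f_0^{1/2}(f_0^{-1/2}f_1f_0^{-1/2})^{\tau}f_0^{1/2}, \tag{26}$$

for $0 \le \tau \le 1$. The geodesic distance is

$$d_{g_1}(f_0, f_1) = \sqrt{\int_{-\pi}^{\pi}\|\log f_0^{-1/2}f_1f_0^{-1/2}\|_{Fr}^2\,\frac{d\theta}{2\pi}}.$$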



Proof: As before, in view of Proposition 7, instead of the geodesic length we may equivalently consider
minimizing the energy/action functional

$$\int_0^1\int_{-\pi}^{\pi}\|f_\tau^{-1/2}\dot{f}_\tau f_\tau^{-1/2}\|_{Fr}^2\,\frac{d\theta}{2\pi}\,d\tau.$$

Clearly, this can be minimized point-wise in $\theta$ invoking Theorem 8. Now, inversion as well as the fractional power
of symmetric (strictly) positive matrices represent continuous and differentiable maps. Hence, it can be easily seen
that, because $f_0, f_1$ are in $\mathcal{F}$, so is

$$f_\tau = f_0^{1/2}(f_0^{-1/2}f_1f_0^{-1/2})^{\tau}f_0^{1/2}.$$

Therefore, this path is the sought minimizer of the energy/action functional above,
and the geodesic length is as claimed. ■
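Since the minimization is carried out point-wise in $\theta$, the geodesic (26) can be evaluated on a frequency grid by applying the matrix formula at every grid point. The sketch below assumes NumPy (whose `eigh` operates on stacks of matrices), approximates the integral over $[-\pi,\pi]$ with weight $d\theta/2\pi$ by a grid average, and uses two hypothetical $2\times 2$ spectra built from a first-order all-pole factor. All names are illustrative.

```python
import numpy as np

def herm_fun(F, fun):
    """Apply a scalar function to each Hermitian matrix in a stack of shape (K, m, m)."""
    w, V = np.linalg.eigh(F)
    return (V * fun(w)[..., None, :]) @ np.conj(np.swapaxes(V, -1, -2))

def whiten(F0, F1):
    """Stack of f0^{-1/2} f1 f0^{-1/2}, symmetrized against round-off."""
    R_inv = herm_fun(F0, lambda w: 1.0 / np.sqrt(w))
    X = R_inv @ F1 @ R_inv
    return 0.5 * (X + np.conj(np.swapaxes(X, -1, -2)))

def spectral_geodesic(F0, F1, tau):
    """Evaluate (26) at every frequency sample."""
    R = herm_fun(F0, np.sqrt)
    return R @ herm_fun(whiten(F0, F1), lambda w: w ** tau) @ R

def spectral_distance(F0, F1):
    """Grid approximation of the geodesic distance in Theorem 11."""
    w = np.linalg.eigvalsh(whiten(F0, F1))            # shape (K, m)
    return float(np.sqrt(np.mean(np.sum(np.log(w) ** 2, axis=-1))))

# Hypothetical spectra sampled on a uniform grid over [-pi, pi).
theta = np.linspace(-np.pi, np.pi, 256, endpoint=False)
e = np.exp(1j * theta)
s = 1.0 / np.abs(1.0 - 0.8 * e) ** 2                  # scalar all-pole spectrum
F0 = np.zeros((theta.size, 2, 2), dtype=complex)
F1 = np.zeros_like(F0)
F0[:, 0, 0], F0[:, 1, 1] = s, 1.0
F1[:, 0, 0], F1[:, 1, 1] = 1.0, s
F0[:, 0, 1] = F1[:, 0, 1] = 0.1 * e
F0[:, 1, 0] = F1[:, 1, 0] = 0.1 * np.conj(e)

F_quarter = spectral_geodesic(F0, F1, 0.25)
print(spectral_distance(F0, F_quarter), 0.25 * spectral_distance(F0, F1))
```

The two printed values should agree, mirroring the scaling property of Theorem 8 applied frequency by frequency.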

Corollary 12: Given any $f_0, f_1, f_2 \in \mathcal{F}$, the function $d_{g_1}(f_0 \sharp_\tau f_1, f_0 \sharp_\tau f_2)$ is convex in $\tau$.

Proof: The proof is a direct consequence of the convexity of the metric $d_g(\cdot,\cdot)$. ■

The importance of the statement in the corollary is that the metric space has nonpositive curvature. Other properties
are similarly inherited. For instance, $d_{g_1}$ satisfies the semi-parallelogram law.

Next we explain that the closure of the space of positive differentiable power spectra, under $g_{1,f}$, is simply the set
of power spectra that are square log-integrable. This is not much of a surprise in view of the metric and the form of the
geodesic distance. Thus, the next proposition shows that the completion, denoted by "bar," is in fact

$$\bar{\mathcal{F}} := \{f \mid m \times m \text{ positive definite a.e. on } [-\pi,\pi],\ \log f \in L_2[-\pi,\pi]\}. \tag{27}$$

It should be noted that the metric $d_{g_1}$ is not equivalent to an $L_2$-based metric $\|\log(f_1) - \log(f_2)\|_2$ for the space.
Here,

$$\|h\|_2^2 := \int_{-\pi}^{\pi}\|h(\theta)\|_{Fr}^2\,\frac{d\theta}{2\pi}.$$

In fact, using the latter, $\bar{\mathcal{F}}$ has zero curvature while, using $d_{g_1}$, $\bar{\mathcal{F}}$ becomes a space with non-positive (non-trivial)
curvature.

Proposition 13: The completion of $\mathcal{F}$ under $d_{g_1}$ is as indicated in (27).

Proof: Clearly, for $f \in \mathcal{F}$, $\log f \in L_2[-\pi,\pi]$ since $f$ is continuous on the closed interval and positive definite.
Further, the logarithm maps positive differentiable matrix-functions to Hermitian differentiable ones, bijectively. Our
proof of $\bar{\mathcal{F}}$ being the completion of $\mathcal{F}$ is carried out in three steps. First we will show that the limit of every
Cauchy sequence in $\mathcal{F}$ belongs to $\bar{\mathcal{F}}$. Next we argue that every point in $\bar{\mathcal{F}}$ is the limit of a sequence in $\mathcal{F}$, which
together with the first step shows that $\mathcal{F}$ is dense in $\bar{\mathcal{F}}$. Finally, we need to show that $\bar{\mathcal{F}}$ is complete with $d_{g_1}$.

First, consider a Cauchy sequence $\{f_n\}$ in $\mathcal{F}$ which converges to $f$. Hence, there exists an $N$, such that for any
$k > N$, $d_{g_1}(f_k, f) < 1$. Using the triangular inequality for $d_{g_1}$, we have that

$$d_{g_1}(I, f) \le d_{g_1}(I, f_N) + d_{g_1}(f_N, f),$$

or, equivalently,

$$\|\log f\|_2 \le \|\log f_N\|_2 + 1.$$

Since $\|\log f_N\|_2$ is finite, $f \in \bar{\mathcal{F}}$.

Next, for any point $f$ in $\bar{\mathcal{F}}$ which is not continuous, we show that it is the limit of a sequence in $\mathcal{F}$. Let
$h = \log f$, then $h \in L_2[-\pi,\pi]$. Since the set of differentiable functions $C^1[-\pi,\pi]$ is dense in $L_2[-\pi,\pi]$, there
exists a sequence $\{h_n \in C^1[-\pi,\pi]\}$ which converges to $h$ in the $L_2$ norm. Using Theorem 3 in [20, pg. 86], there
exists a subsequence $\{h_{n_k}\}$ which converges to $h$ almost everywhere in $[-\pi,\pi]$, i.e.,

$$\|h_{n_k}(\theta) - h(\theta)\|_{Fr} \to 0 \text{ a.e., as } n_k \to \infty.$$

Since the exponential map is continuous [21, pg. 430], $\|e^{h_{n_k}(\theta)} - e^{h(\theta)}\|_{Fr}$ converges to 0 almost everywhere as
well. Using the sub-multiplicative property of the Frobenius norm, we have that

$$\|I - e^{-h(\theta)}e^{h_{n_k}(\theta)}\|_{Fr} \le \|e^{-h(\theta)}\|_{Fr}\,\|e^{h(\theta)} - e^{h_{n_k}(\theta)}\|_{Fr},$$

where the right side of the above inequality goes to zero. Thus the spectral radius of $(I - e^{-h(\theta)}e^{h_{n_k}(\theta)})$ goes to
zero [22, pg. 297]. Hence, all the eigenvalues $\lambda_i(e^{-h(\theta)}e^{h_{n_k}(\theta)})$, $1 \le i \le m$, converge to 1 as $k \to \infty$. Then,
$f_{n_k} = e^{h_{n_k}} \in \mathcal{F}$ and

$$d_{g_1}(f_{n_k}, f) = \sqrt{\int_{-\pi}^{\pi}\|\log f^{-1/2}f_{n_k}f^{-1/2}\|_{Fr}^2\,\frac{d\theta}{2\pi}} = \sqrt{\int_{-\pi}^{\pi}\sum_{i=1}^{m}\log^2\lambda_i(f^{-1}f_{n_k})\,\frac{d\theta}{2\pi}} = \sqrt{\int_{-\pi}^{\pi}\sum_{i=1}^{m}\log^2\lambda_i(e^{-h}e^{h_{n_k}})\,\frac{d\theta}{2\pi}}.$$

Since $\log\lambda_i(e^{-h}e^{h_{n_k}}) \to 0$ a.e., for $1 \le i \le m$, $d_{g_1}(f_{n_k}, f) \to 0$ as well. Therefore, $f$ is the limit of $\{f_{n_k}\}$.

Finally we show that $\bar{\mathcal{F}}$ is complete under $d_{g_1}$. Let $\{f_n\}$ be a Cauchy sequence in $(\bar{\mathcal{F}}, d_{g_1})$, and let $h_n = \log f_n$.
Using the inequality (25), we have

$$d_{g_1}(f_k, f_l) \ge \|h_k - h_l\|_2.$$

Thus $\{h_n\}$ is also a Cauchy sequence in $L_2[-\pi,\pi]$, which is a complete metric space. As a result, $\{h_n\}$ converges
to a point $h$ in $L_2[-\pi,\pi]$. Following a similar procedure as in the previous step, there exists a subsequence $\{f_{n_k}\}$
which converges to $f = e^{h} \in \bar{\mathcal{F}}$. This completes our proof. ■

Remark 14: Geodesics of $g_{2,f}$ for scalar power spectra were constructed in [7]. At the present time, a multivariable
generalization appears to be a daunting task. The main obstacle is of course non-commutativity of matricial
density functions and the absence of an integral representation of analytic spectral factors in terms of matrix-valued
power spectral densities. In this direction we point out that some of the needed tools are in place. For instance, a
square matrix-valued function which is analytic and non-singular in the unit disc $\mathbb{D}$ admits a logarithm which is
also analytic in $\mathbb{D}$. To see this, consider such a matrix-function, say $f_+(z)$. The matrix logarithm is well defined
locally in a neighborhood of any $z_0 \in \mathbb{D}$ via the Cauchy integral

$$g(z) = \frac{1}{2\pi i}\int_{L_{z_0}}\ln(\zeta)\,(\zeta I - f_+(z))^{-1}\,d\zeta.$$

Here, $L_{z_0}$ is a closed path in the complex plane that encompasses all of the eigenvalues of $f_+(z_0)$ and does not
separate the origin from the point at $\infty$. The Cauchy integral gives a matrix-function $g(z)$ which is analytic in
a sufficiently small neighborhood of $z_0$ in the unit disc $\mathbb{D}$ (the size of the neighborhood being dictated by the
requirement that the eigenvalues stay within $L_{z_0}$), and $\exp(g(z)) = f_+(z)$. To define the logarithm consistently over
$\mathbb{D}$ we need to ensure that we always take the same principal value. This is indeed the case if we extend $g(z)$ via
analytic continuation: since $f_+(z)$ is not singular anywhere in $\mathbb{D}$ and the unit disc is simply connected, the values
for $g(z)$ will be consistent, i.e., any path from $z_0$ to an arbitrary $z \in \mathbb{D}$ will lead to the same value for $g(z)$. Thus,
one can set $\log(f_+) = g$ and understand this to be a particular version of the logarithm. Similarly, powers of $f_+$
can also be defined using Cauchy integrals

$$\frac{1}{2\pi i}\int_{L_{z_0}}\zeta^{\tau}\,(\zeta I - f_+(z))^{-1}\,d\zeta$$

for $\tau \in [0,1]$, first in a neighborhood of a given $z_0 \in \mathbb{D}$, and then by analytic continuation to the whole of $\mathbb{D}$. As
with the logarithm, there may be several versions. Geodesics for $g_{2,f}$ appear to require paths in the space of
canonical spectral factors for the corresponding matricial densities, such as $f_{\tau+} = f_{0+}(f_{0+}^{-1}f_{1+})^{\tau}$. However, the
correct expression remains elusive at present. □
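The local Cauchy-integral definition of the logarithm in Remark 14 can be illustrated at a single point $z_0$. The sketch below, assuming NumPy, takes a hypothetical non-singular matrix `A` as a stand-in for $f_+(z_0)$, uses a circle as the contour $L_{z_0}$ (chosen to enclose the eigenvalues of `A` while staying away from the origin and the negative real axis), and approximates the integral with the trapezoidal rule. The function names and the choice of contour are illustrative.

```python
import numpy as np

def cauchy_log(A, center, radius, n_nodes=200):
    """Trapezoidal-rule approximation of
    (1 / (2 pi i)) * contour integral of ln(zeta) (zeta I - A)^{-1} d zeta
    over the circle |zeta - center| = radius, traversed counterclockwise."""
    m = A.shape[0]
    G = np.zeros((m, m), dtype=complex)
    for k in range(n_nodes):
        step = radius * np.exp(2j * np.pi * k / n_nodes)  # zeta - center
        zeta = center + step
        # d zeta = i * step * d phi, so the factor 1/(2 pi i) leaves step / n_nodes.
        G += np.log(zeta) * step * np.linalg.inv(zeta * np.eye(m) - A)
    return G / n_nodes

def eig_log(A):
    """Reference logarithm of a diagonalizable matrix via its eigendecomposition."""
    w, V = np.linalg.eig(A)
    return V @ np.diag(np.log(w.astype(complex))) @ np.linalg.inv(V)

# Hypothetical stand-in for f_+(z0): eigenvalues 2 and 3, not Hermitian.
A = np.array([[2.0, 1.0], [0.0, 3.0]])
G = cauchy_log(A, center=2.5, radius=1.5)  # circle encloses [1, 4], excludes 0
print(np.round(G.real, 6))
```

Replacing `np.log(zeta)` by `zeta ** tau` in the same loop gives the corresponding sketch for the fractional powers mentioned in the remark.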

VII. Examples

We first demonstrate geodesics connecting two power spectral densities that correspond to all-pole models, i.e.,
two autoregressive (AR) spectra. The geodesic path between them does not consist of AR-spectra, and it can be
considered as a non-parametric model for the transition. The choice of AR-spectra for the end points is only for
convenience. As discussed earlier, the aim of the theory is to serve as a tool in non-parametric estimation, path
following, morphing, etc., in the spectral domain.

A scalar example:

Consider the two power spectral densities

$$f_i(\theta) = \frac{1}{|a_i(e^{j\theta})|^2}, \quad i \in \{0, 1\},$$

where



$$a_0(z) = (z^2 - 1.96\cos(\tfrac{\pi}{5})z + 0.98^2)(z^2 - 1.7\cos(\tfrac{\pi}{3})z + 0.85^2)(z^2 - 1.8\cos(\cdots)z + 0.9^2),$$

$$a_1(z) = (z^2 - 1.96\cos(\tfrac{2\pi}{15})z + 0.98^2)(z^2 - 1.5\cos(\tfrac{7\pi}{30})z + 0.75^2)(z^2 - 1.8\cos(\tfrac{\cdots}{8})z + 0.9^2).$$

Their roots are marked by x's and o's respectively in Figure 2, and shown with respect to the unit circle in the
complex plane. We consider and compare the following three ways of interpolating power spectra between $f_0$ and
$f_1$.

Fig. 1. Plots of $\log f_0(\theta)$ (upper) and $\log f_1(\theta)$ (lower) for $\theta \in [0, \pi]$.

Fig. 2. Locus of the roots of $a_\tau(z)$ for $\tau \in [0, 1]$.



First, a parametric approach where the AR-coefficients are interpolated:

$$f_{\tau,\mathrm{AR}}(\theta) = \frac{1}{|a_\tau(e^{j\theta})|^2}, \tag{28a}$$

with $a_\tau(z) = (1 - \tau)a_0(z) + \tau a_1(z)$. Clearly, there is a variety of alternative options (e.g., to interpolate partial
reflection coefficients, etc.). However, our choice is intended to highlight the fact that in a parameter space,
admissible models may not always form a convex set. This is evidently the case here as the path includes factors
that become "unstable." The locus of the roots of $a_\tau(z) = 0$ for $\tau \in [0, 1]$ is shown in Figure 2.

Then we consider a linear segment connecting the two spectra:

$$f_{\tau,\mathrm{linear}} = (1 - \tau)f_0 + \tau f_1. \tag{28b}$$

Again, this is to highlight the fact that the space of power spectra is not linear, and in this case, extrapolation
beyond the convex linear combination of the two spectra leads to inadmissible functions (as the path leads outside
of the cone of positive functions). Finally, we provide the $g_1$-geodesic between the two

$$f_{\tau,\mathrm{geodesic}} = f_0\left(\frac{f_1}{f_0}\right)^{\tau}. \tag{28c}$$
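The contrast between (28b) and (28c) under extrapolation can be reproduced with a few lines of code. The sketch below assumes NumPy and, rather than the sixth-order spectra of the example, uses two hypothetical first-order all-pole spectra, one peaked at $\theta = 0$ and one at $\theta = \pi$; the value $\tau = 1.5$ is likewise an arbitrary choice outside $[0,1]$.

```python
import numpy as np

theta = np.linspace(0.0, np.pi, 512)
z = np.exp(1j * theta)

# Hypothetical first-order all-pole spectra (not the ones used in the paper).
f0 = 1.0 / np.abs(1.0 - 0.9 / z) ** 2   # peaked at theta = 0
f1 = 1.0 / np.abs(1.0 + 0.9 / z) ** 2   # peaked at theta = pi

def linear_path(tau):
    """Linear segment (28b); may leave the cone of positive functions for tau outside [0, 1]."""
    return (1.0 - tau) * f0 + tau * f1

def geodesic_path(tau):
    """Scalar geodesic (28c); equivalently, linear interpolation of log f0 and log f1."""
    return f0 * (f1 / f0) ** tau

tau = 1.5  # extrapolation beyond the end point f1
print(linear_path(tau).min())    # negative: not an admissible spectrum
print(geodesic_path(tau).min())  # positive for every tau
```

In the scalar case the geodesic is simply a straight line in the logarithm of the spectrum, which is why it remains positive for every real $\tau$.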



We compare $f_{\tau,\mathrm{AR}}$, $f_{\tau,\mathrm{linear}}$ and $f_{\tau,\mathrm{geodesic}}$ for three values of $\tau$. We first note that, in plotting $\log f_{\tau,\mathrm{AR}}$ in Figure 3,
$f_{\tau,\mathrm{AR}}$ is not shown for one of these values since it is not admissible. Likewise, $\log f_{\tau,\mathrm{linear}}$ in Figure 4 breaks up for one
value of $\tau$, since $f_{\tau,\mathrm{linear}}$ becomes negative for a range of frequencies; the dashed curve indicates the absolute value of the
logarithm when this takes complex values. The plot of $\log f_{\tau,\mathrm{geodesic}}$ is defined for all $\tau$ and shown in Figure 5. It is worth
pointing out how two apparent "modes" in $f_{\tau,\mathrm{linear}}$ and $f_{\tau,\mathrm{geodesic}}$ are swapping their dominance, which does not
occur when following $f_{\tau,\mathrm{AR}}$.

Fig. 3. $\log f_{\tau,\mathrm{AR}}(\theta)$ for three values of $\tau$ (blue), $\tau = 0, 1$ (red).

Fig. 4. $\log f_{\tau,\mathrm{linear}}(\theta)$ for three values of $\tau$ (blue), $\tau = 0, 1$ (red).

Fig. 5. $\log f_{\tau,\mathrm{geodesic}}(\theta)$ for three values of $\tau$ (blue), $\tau = 0, 1$ (red).

A multivariable example:

Consider the two matrix-valued power spectral densities



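$$f_0 = \begin{bmatrix} 1 & 0 \\ 0.1e^{j\theta} & 1 \end{bmatrix}\Big[\ \cdots\ \Big]\begin{bmatrix} 1 & 0.1e^{-j\theta} \\ 0 & 1 \end{bmatrix}, \qquad f_1 = \begin{bmatrix} 1 & 0.1e^{j\theta} \\ 0 & 1 \end{bmatrix}\Big[\ \cdots\ \Big]\begin{bmatrix} 1 & 0 \\ 0.1e^{-j\theta} & 1 \end{bmatrix}.$$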

Typically, these reflect the dynamic relationship between two time series; in turn these may represent noise
input/output of dynamical systems or measurements across an independent array of sensors, etc. The particular example
reflects the typical effect of an energy source shifting its signature from one of two sensors to the other as, for
instance, a possible scatterer moves with respect to the two sensors.

Below, $f_0$ and $f_1$ are shown in Fig. 6 and Fig. 7, respectively. Since the value of a power spectral density $f$, at
each point in frequency, is a Hermitian matrix, our convention is to show in the (1,1), (1,2) and (2,2) subplots the
log-magnitude of the entries $f(1,1)$, $f(1,2)$ (which is the same as that of $f(2,1)$) and $f(2,2)$, respectively. Then, since
only $f(1,2)$ is complex (and the complex conjugate of $f(2,1)$), we plot its phase in the (2,1) subplot.




Fig. 6. Subplots (1,1), (1,2) and (2,2) show $\log f_0(1,1)$, $\log|f_0(1,2)|$ (same as $\log|f_0(2,1)|$) and $\log f_0(2,2)$. Subplot (2,1) shows
$\arg(f_0(2,1))$.

Fig. 7. Subplots (1,1), (1,2) and (2,2) show $\log f_1(1,1)$, $\log|f_1(1,2)|$ (same as $\log|f_1(2,1)|$) and $\log f_1(2,2)$. Subplot (2,1) shows
$\arg(f_1(2,1))$.

Fig. 8. Subplots (1,1), (1,2) and (2,2) show $\log f_\tau(1,1)$, $\log|f_\tau(1,2)|$ (same as $\log|f_\tau(2,1)|$) and $\log f_\tau(2,2)$. Subplot (2,1) shows
$\arg(f_\tau(2,1))$, for $\tau \in [0,1]$.

The three-dimensional surfaces in Figure 8 show the geodesic connecting $f_0$ to $f_1$. Here, $f_{\tau,\mathrm{geodesic}}$ is drawn using

$$f_{\tau,\mathrm{geodesic}} = f_0^{1/2}(f_0^{-1/2}f_1f_0^{-1/2})^{\tau}f_0^{1/2}.$$

It is interesting to observe the smooth shift of the energy across frequency and directionality.

VIII. Conclusions 

The aim of this study has been to develop multivariable divergence measures and metrics for matrix-valued power
spectral densities. These are expected to be useful in quantifying uncertainty in the spectral domain, detecting events
in non-stationary time series, and smoothing and spectral estimation in the context of vector-valued stochastic processes.
The spirit of the work follows closely classical accounts going back to HI, lH and proceeds along the lines of
[7]. Early work in signal analysis and system identification has apparently focused only on divergence measures
between scalar spectral densities, and only recently have such issues on multivariable power spectra attracted
attention lH, [9]. Further, this early work on scalar power spectra was shown to have deep roots in statistical
inference, the Fisher-Rao metric, and Kullback-Leibler divergence fSJ, 111 page 371], [7], [13]. Thus, it is expected
that interesting connections between the geometry of multivariable power spectra and information geometry will
be established as well.



References 

[1] M. Basseville, "Distance measures for signal processing and pattern recognition," Signal Processing, vol. 18, no. 4, pp. 349-369, 1989.
[2] R. Gray, A. Buzo, A. Gray Jr, and Y. Matsuyama, "Distortion measures for speech processing," Acoustics, Speech and Signal Processing, IEEE Transactions on, vol. 28, no. 4, pp. 367-376, 1980.
[3] C. Rao, "Information and the accuracy attainable in the estimation of statistical parameters," Bull. Calcutta Math. Soc., vol. 37, pp. 81-91, 1945.
[4] S.-I. Amari and H. Nagaoka, Methods of information geometry. Amer. Math. Soc., 2000.
[5] N. Cencov, Statistical decision rules and optimal inference. Amer. Math. Soc., 1982, no. 53.
[6] R. Kass and P. Vos, Geometrical foundations of asymptotic inference. Wiley New York, 1997.
[7] T. Georgiou, "Distance and Riemannian metrics for spectral density functions," Signal Processing, IEEE Transactions on, vol. 55(8), pp. 3995-4003, 2007.
[8] A. Ferrante, C. Masiero, and M. Pavon, "Time and spectral domain relative entropy: A new approach to multivariate spectral estimation," arXiv preprint arXiv:1103.5602, 2011.
[9] A. Ferrante, M. Pavon, and F. Ramponi, "Hellinger versus Kullback-Leibler multivariable spectrum approximation," Automatic Control, IEEE Transactions on, vol. 53, no. 4, pp. 954-967, 2008.



JULYS, 2011 21 

[10] T. Georgiou, "Relative entropy and the multivariable multidimensional moment problem," Information Theory, IEEE Transactions on, vol. 52, no. 3, pp. 1052-1066, 2006.
[11] R. Bhatia, Positive definite matrices. Princeton Univ Pr, 2007.
[12] M. Pinsker, Information and information stability of random variables and processes. Izv. Akad. Nauk. SSSR, Moscow, 1960, English translation: San Francisco, CA: Holden-Day, 1964.
[13] S. Yu and P. Mehta, "The Kullback-Leibler rate pseudo-metric for comparing dynamical systems," Automatic Control, IEEE Transactions on, vol. 55, no. 7, pp. 1585-1598, 2010.
[14] N. Wiener and P. Masani, "The prediction theory of multivariate stochastic processes, Part I," Acta Math., vol. 98, pp. 111-150, 1957.
[15] P. Masani, Recent trends in multivariable prediction theory. (Krishnaiah, P.R., Editor), Multivariate Analysis, pp. 351-382. Academic Press, 1966.
[16] T. Georgiou, "The Caratheodory-Fejer-Pisarenko decomposition and its multivariable counterpart," Automatic Control, IEEE Transactions on, vol. 52, no. 2, pp. 212-228, 2007.
[17] P. Petersen, Riemannian geometry. Springer Verlag, 2006.
[18] J. Jost, Nonpositive curvature: geometric and analytic aspects. Birkhauser, 1997.
[19] M. Moakher, "A differential geometric approach to the geometric mean of symmetric positive-definite matrices," SIAM Journal on Matrix Analysis and Applications, vol. 26, no. 3, pp. 735-747, 2005.
[20] A. Kolmogorov and S. Fomin, Elements of the theory of functions and functional analysis. Volume 2. Graylock Press, 1961.
[21] R. Horn and C. Johnson, Topics in matrix analysis. Cambridge university press, 1994.
[22] R. Horn and C. Johnson, Matrix analysis. Cambridge university press, 2005.



